Abstract — We further study the connection between algorithmic entropy and the Shannon and Renyi entropies. We give an example for which the difference between the expected value of algorithmic entropy and Shannon entropy meets the known upper bound, and, for Renyi entropy, we prove that for all other values of the parameter (α) the same difference can be large. We also prove that, for a particular type of distributions, Shannon entropy is able to capture the notion of computationally accessible information by relating it to time-bounded algorithmic entropy. In order to better study this unexpected relation, we investigate the behavior of the different entropies (Shannon, Renyi and Tsallis) under the distribution based on the time-bounded algorithmic entropy.

I. Introduction 

Algorithmic entropy, the size of the smallest program that generates a string, denoted by $K(x)$, is a rigorous measure of the amount of information, or randomness, in an individual object x. Algorithmic entropy and Shannon entropy are conceptually very different, as the former is based on the size of programs and the latter on probability distributions. Surprisingly, they are nevertheless closely related: the expectation of the algorithmic entropy equals (up to a constant depending on the distribution) the Shannon entropy.

Shannon entropy measures the amount of information in situations where unlimited computational power is available. However, this measure does not provide a satisfactory framework for the analysis of public-key cryptosystems, which are based on the limited computational power of the adversary. The public key and the ciphertext together contain all the Shannon information concerning the plaintext, but this information is computationally inaccessible. So we face an intriguing question: what is accessible information?
By considering the time-bounded algorithmic entropy (the length of the shortest program that produces the string within time $t(|x|)$) we can take into account the computational difficulty (time) of extracting information. Under some computational restrictions on the distributions, we show (Theorem 15) that Shannon entropy equals (up to a constant that depends only on the distribution) the time-bounded algorithmic information. This result partially solves, for this type of distributions, the problem of finding a measure that captures the notion of computationally accessible information. The result is unexpected, since it states that, for the class of probability distributions whose cumulative probability distribution is computable in time $t(n)$, Shannon entropy captures the computational difficulty of extracting information within this time bound.
With this result in mind, we further study the relation between the probability distribution based on time-bounded algorithmic entropy and several entropy measures (Shannon, Renyi and Tsallis).

II. Preliminaries
All strings used are elements of $\Sigma^* = \{0,1\}^*$. $\Sigma^n$ denotes the set of strings of length n and $|\cdot|$ denotes the length of a string. It is assumed that all strings are ordered in lexicographic order. When $x - 1$ is written, where x is a string, it means the predecessor of x in this order. The function log is the function $\log_2$. The real interval between a and b, including a and excluding b, is represented by $[a, b)$.

A. Algorithmic Information Theory 

We give the essential definitions and basic results that will be needed in the rest of the paper. A more detailed reference is [LV97]. The model of computation used is the prefix-free Turing machine. A set of strings A is prefix-free if no string in A is a prefix of another string of A. Notice that the Kraft inequality guarantees that, for any prefix-free set A,

$$\sum_{x \in A} 2^{-|x|} \le 1.$$

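The Kraft inequality can be checked mechanically on small examples. The following Python sketch (the helper names are ours, purely illustrative) tests whether a finite set of binary strings is prefix-free and evaluates $\sum_{x \in A} 2^{-|x|}$ exactly with rational arithmetic. The set B shows that the bound can fail once the prefix-free condition is dropped.

```python
from fractions import Fraction
from itertools import permutations


def is_prefix_free(strings):
    """True if no string in the collection is a proper prefix of another one."""
    return not any(w.startswith(u) and u != w for u, w in permutations(strings, 2))


def kraft_sum(strings):
    """Exact value of sum_{x in A} 2^(-|x|)."""
    return sum((Fraction(1, 2 ** len(x)) for x in strings), Fraction(0))


A = {"0", "10", "110", "111"}   # prefix-free
B = {"0", "01", "1"}            # "0" is a prefix of "01"

print(is_prefix_free(A), kraft_sum(A))  # True 1
print(is_prefix_free(B), kraft_sum(B))  # False 5/4
```

Exact fractions avoid any rounding doubt when the sum is exactly 1, which is the boundary case of the inequality.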
Definition 1. Let U be a fixed prefix-free universal Turing machine. For any string $x \in \Sigma^*$, the Kolmogorov complexity or algorithmic entropy of x is $K(x) = \min_p\{|p| : U(p) = x\}$. For any time-constructible t, the t-time-bounded algorithmic entropy (or t-time-bounded Kolmogorov complexity) of $x \in \Sigma^*$ is $K^t(x) = \min_p\{|p| : U(p) = x \text{ in at most } t(|x|) \text{ steps}\}$.

The choice of the universal Turing machine affects the running time of a program by at most a logarithmic factor, and the program length by at most a constant number of extra bits.

Proposition 2. For all x and y we have:

1) $K(x) \le K^t(x) \le |x| + O(1)$;

2) $K(x|y) \le K(x) + O(1)$ and $K^t(x|y) \le K^t(x) + O(1)$.

Definition 3. A string x is said to be algorithmically random, or Kolmogorov-random, if $K(x) \ge |x|$.

A simple counting argument shows the existence of algorithmically random strings of any length.

Definition 4. A semi-measure over a space X is a function $f : X \to [0,1]$ such that $\sum_{x \in X} f(x) \le 1$. We say that a semi-measure is a measure if equality holds. A semi-measure is called constructive if it is semi-computable from below.

The function $m(x) = 2^{-K(x)}$ is a constructive semi-measure that dominates any other constructive semi-measure μ ([Lev74] and [Gac74]), in the sense that there is a constant $c_\mu = 2^{-K(\mu)}$ such that, for all x, $m(x) \ge c_\mu\,\mu(x)$. For this reason, this semi-measure is called universal. Since it is natural to consider time bounds on Kolmogorov complexity, we can define a time-bounded version of $m(x)$.

Definition 5. The t-time-bounded universal distribution, denoted by $m^t$, is $m^t(x) = c\,2^{-K^t(x)}$, where c is a fixed constant such that $\sum_x m^t(x) = 1$.

In [LV97], Claim 7.6.1, the authors prove that $m^{nt(n)}$ dominates every distribution μ such that μ*, the cumulative probability distribution of μ, is computable in time $t(\cdot)$.

Theorem 6. If μ* is computable in time $t(n)$, then there exists a constant c such that, for all $x \in \Sigma^*$, $m^{nt(n)}(x) \ge c\,\mu(x)$.
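To make the hypothesis "μ* computable in time t(n)" concrete, the following Python sketch computes the cumulative distribution of a hypothetical distribution $P(x) = 2^{-2|x|-1}$ (each length n carries total mass $2^{-n-1}$). It assumes that strings are ordered by length and then lexicographically, the ordering under which every non-empty string has a predecessor $x - 1$, and that $P^*(x) = \sum_{y \le x} P(y)$. With exact dyadic arithmetic, $P^*(x)$ is obtained from a closed form in time polynomial in $|x|$, and $P(x) = P^*(x) - P^*(x-1)$.

```python
from fractions import Fraction


def P(x: str) -> Fraction:
    """Hypothetical example distribution: P(x) = 2^(-2|x|-1).
    Each length n carries total mass 2^n * 2^(-2n-1) = 2^(-n-1), so P sums to 1."""
    return Fraction(1, 2 ** (2 * len(x) + 1))


def P_cum(x: str) -> Fraction:
    """P*(x) = sum of P(y) over all y <= x in length-lexicographic order."""
    n = len(x)
    shorter = 1 - Fraction(1, 2 ** n)    # total mass of all strings shorter than x
    rank = int(x, 2) if x else 0          # strings of length n that precede x
    return shorter + (rank + 1) * P(x)


def predecessor(x: str) -> str:
    """x - 1 in length-lexicographic order (undefined for the empty string)."""
    if not x:
        raise ValueError("the empty string has no predecessor")
    rank = int(x, 2)
    if rank == 0:                        # 0^n comes right after 1^(n-1)
        return "1" * (len(x) - 1)
    return format(rank - 1, "b").zfill(len(x))


for x in ["", "0", "1", "00", "10"]:
    print(repr(x), P_cum(x))  # 1/2, 5/8, 3/4, 25/32, 27/32
```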

B. Entropies 

We consider several types of entropies. Shannon information 
theory was introduced in 1948 by C.E. Shannon [Sha48]. 
Information theory quantifies the uncertainty about the results 
of an experiment. It is based on the concept of entropy which 
measures the number of bits necessary to describe an outcome 
from an ensemble. 

Definition 7 (Shannon Entropy [Sha48]). Let $\mathcal{X}$ be a finite or countably infinite set and let X be a random variable taking values in $\mathcal{X}$ with distribution P. The Shannon entropy of the random variable X is given by

$$H(X) = -\sum_{x \in \mathcal{X}} P(x) \log P(x).$$



The Renyi entropy is a generalization of Shannon entropy. 
Formally the Renyi entropy is defined as follows: 

Definition 8 (Renyi Entropy [Ren61]). Let $\mathcal{X}$ be a finite or countably infinite set, let X be a random variable taking values in $\mathcal{X}$ with distribution P, and let $\alpha \ne 1$ be a positive real number. The Renyi entropy of order α of the random variable X is defined as:

$$H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x \in \mathcal{X}} P(x)^\alpha.$$

It can be shown that $\lim_{\alpha \to 1} H_\alpha(X) = H(X)$.

Definition 9 (Min-Entropy). Let $\mathcal{X}$ be a finite or countably infinite set and let X be a random variable taking values in $\mathcal{X}$ with distribution P. We define the min-entropy of P by:

$$H_\infty(P) = -\log \max_{x \in \mathcal{X}} P(x).$$

It is easy to see that $H_\infty(P) = \lim_{\alpha \to \infty} H_\alpha(P)$.

Definition 10 (Tsallis Entropy [Ts88]). Let $\mathcal{X}$ be a finite or countably infinite set, let X be a random variable taking values in $\mathcal{X}$ with distribution P, and let $\alpha \ne 1$ be a positive real number. The Tsallis entropy of order α of the random variable X is defined as:

$$T_\alpha(X) = \frac{1 - \sum_{x \in \mathcal{X}} P(x)^\alpha}{\alpha - 1}.$$
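A minimal Python sketch of Definitions 7 to 10 for finite distributions, given as lists of probabilities. Logarithms are in base 2 as in Section II; the Tsallis formula assumes the usual normalization with constant 1, whose limit as α → 1 is Shannon entropy measured in nats. The example also illustrates the limits α → 1 and α → ∞ and the fact that $H_\alpha$ is non-increasing in α.

```python
import math


def shannon(p):
    """H(P) = -sum P(x) log2 P(x); zero-probability outcomes contribute nothing."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def renyi(p, a):
    """H_a(P) = log2(sum P(x)^a) / (1 - a), extended by its limits at a = 1 and a = inf."""
    if a == 1:
        return shannon(p)
    if math.isinf(a):
        return -math.log2(max(p))          # min-entropy H_inf
    return math.log2(sum(q ** a for q in p if q > 0)) / (1 - a)


def tsallis(p, a):
    """T_a(P) = (1 - sum P(x)^a) / (a - 1); the a -> 1 limit is H(P) * ln 2."""
    if a == 1:
        return shannon(p) * math.log(2)
    return (1 - sum(q ** a for q in p if q > 0)) / (a - 1)


P = [1/2, 1/4, 1/8, 1/8]
print(shannon(P))                            # 1.75
print(renyi(P, 0), renyi(P, math.inf))       # 2.0 1.0
print(tsallis(P, 2))                         # 0.65625
print([round(renyi(P, a), 4) for a in (0.5, 1, 2, 10)])  # non-increasing in a
```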
C. Algorithmic Information vs. Entropy Information

Given the conceptual differences between the definitions of Algorithmic Information Theory and Information Theory, it is surprising that, under some weak restrictions on the distribution of the strings, they are closely related, in the sense that the expectation of the algorithmic entropy equals the entropy of the distribution up to a constant that depends only on that distribution.

Theorem 11. Let $P(x)$ be a recursive probability distribution. Then:

$$0 \le \sum_x P(x)K(x) - H(P) \le K(P).$$

Proof. (Sketch, see [LV97] for details.) The first inequality follows directly from the well-known Noiseless Coding Theorem, which, for these distributions, states

$$H(P) \le \sum_x P(x)K(x).$$

Since m is universal, $P(x) \le 2^{K(P)} m(x)$ for all x, which is equivalent to $\log P(x) \le K(P) - K(x)$. Thus, we have:

$$\sum_x P(x)K(x) - H(P) = \sum_x P(x)\bigl(K(x) + \log P(x)\bigr) \le \sum_x P(x)\bigl(K(x) + K(P) - K(x)\bigr) = K(P). \qquad \square$$
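The first inequality of Theorem 11 holds because the shortest programs for distinct strings are distinct and, U being prefix-free, form a prefix code, so the expected code length is at least $H(P)$. Since K is not computable, the sketch below swaps in a computable prefix code, the Shannon code with lengths $\lceil -\log P(x) \rceil$, purely as an illustration of the classical coding bound; it is not the construction used in the proof.

```python
import math


def shannon_code_lengths(p):
    """Lengths ceil(-log2 P(x)) of the Shannon code; by Kraft they admit a prefix code."""
    return [math.ceil(-math.log2(q)) for q in p]


def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)


P = [0.4, 0.3, 0.2, 0.1]
L = shannon_code_lengths(P)
expected_length = sum(q * l for q, l in zip(P, L))
print(L)                                                # [2, 2, 3, 4]
print(sum(2.0 ** -l for l in L) <= 1)                   # True (Kraft inequality)
print(entropy(P) <= expected_length < entropy(P) + 1)   # True (noiseless coding bound)
```

For dyadic distributions, such as (1/2, 1/4, 1/8, 1/8), the Shannon code meets the entropy exactly.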

III. Algorithmic Entropy vs. Entropy: How Close? 

Given the surprising relationship between algorithmic entropy and entropy, in this section we investigate how close they are. We also study the relation between algorithmic entropy and Renyi entropy. In particular, we find the values of α for which the same relation as in Theorem 11 holds for the Renyi entropy. We also prove that, for a particular type of distributions, entropy is able to capture the notion of computationally accessible information.
First we show that the interval $[0, K(P)]$ of the inequalities of Theorem 11 is tight:

Proposition 12. There exist distributions P, with $K(P)$ large, such that:

1) $\sum_x P(x)K(x) - H(P) = K(P) - O(1)$;

2) $\sum_x P(x)K(x) - H(P) = O(1)$.

Proof. 1) Fix $x_0 \in \Sigma^n$. Consider the following probability distribution:

$$P_n(x) = \begin{cases} 1 & \text{if } x = x_0, \\ 0 & \text{otherwise.} \end{cases}$$

Notice that describing the distribution is equivalent to describing $x_0$. So, $K(P_n) = K(x_0) + O(1)$. On the other hand, since $H(P_n) = 0$, we have $\sum_x P_n(x)K(x) - H(P_n) = K(x_0)$. So, if $x_0$ is Kolmogorov-random, then $K(P_n) \ge n - O(1)$.
2) Let y be a string of length n such that $K(y) \ge n - O(1)$ and consider the following probability distribution over $\Sigma^*$:

$$P_n(x) = \begin{cases} 0.y & \text{if } x = x_0, \\ 1 - 0.y & \text{if } x = x_1, \\ 0 & \text{otherwise,} \end{cases}$$

where $0.y$ represents the real number between 0 and 1 whose binary representation is y. Notice that we can choose $x_0$ and $x_1$ such that $K(x_0), K(x_1) \le c$, where c is a constant that does not depend on n.
Thus we have:

a) $K(P_n) \approx n$, since describing $P_n$ is equivalent to describing $x_0$, $x_1$ and y;

b) $\sum_x P_n(x)K(x) = 0.y\,K(x_0) + (1 - 0.y)\,K(x_1) \le 0.y \times c + (1 - 0.y) \times c = c$;

c) $H(P_n) = -0.y \log 0.y - (1 - 0.y)\log(1 - 0.y) \le 1$.

Thus $\sum_x P_n(x)K(x) - H(P_n) \le c \ll K(P_n) \approx n$.

□
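Item c) is the only part of the construction in item 2 that can be computed directly, since K is not computable. The sketch below (helper names are illustrative) maps a bit string y to the rational $0.y$ and evaluates $H(P_n)$ for the two-point distribution; whatever y is, the value never exceeds 1, whereas for random y the complexity $K(P_n)$ grows with n.

```python
import math
from fractions import Fraction


def binary_fraction(y: str) -> Fraction:
    """The real number 0.y in [0, 1) whose binary expansion is the bit string y."""
    return Fraction(int(y, 2), 2 ** len(y))


def two_point_entropy(p) -> float:
    """H(P_n) for P_n(x0) = p, P_n(x1) = 1 - p (binary entropy, in bits)."""
    p = float(p)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


for y in ["1", "0110", "10110011"]:
    p = binary_fraction(y)
    print(y, p, round(two_point_entropy(p), 4))
```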

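Now we address the question of whether the same relations as in Theorem 11 hold for the Renyi entropy. We show, in fact, that the Shannon entropy is the "smallest" entropy that verifies these properties.

Since, for every $0 < \varepsilon < 1$,

$$0 \le H_{1+\varepsilon}(X) \le H(X) \le H_{1-\varepsilon}(X) \le H_0(X),$$

it follows that

$$\sum_x P(x)K(x) - H_\alpha(P) \ge 0 \ \text{ for } \alpha > 1, \qquad \sum_x P(x)K(x) - H_\alpha(P) \le K(P) \ \text{ for } \alpha < 1.$$

In the next result we show that the remaining inequalities of Theorem 11, namely the upper bound for α > 1 and the lower bound for α < 1, are in general false.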

Theorem 13. For every Δ > 0 and α > 1 there exists a recursive distribution P such that:

1) $\sum_x P(x)K(x) - H_\alpha(P) \ge (K(P))^\alpha$;

2) $\sum_x P(x)K(x) - H_\alpha(P) \ge K(P) + \Delta$.

The proof of this Theorem is similar to the proof of the 
following Corollary: 

Corollary 14. There exists a recursive probability distribution P such that:

1) $\sum_x P(x)K(x) - H_\alpha(P) > K(P)$, where α > 1;

2) $\sum_x P(x)K(x) - H_\alpha(P) < 0$, where α < 1.

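Proof. For $x \in \{0,1\}^n$, consider the following probability distribution:

$$P_n(x) = \begin{cases} 1/2 & \text{if } x = 0^n, \\ 2^{-n} & \text{if } x = 1x',\ x' \in \{0,1\}^{n-1}, \\ 0 & \text{otherwise.} \end{cases}$$

It is clear that this distribution is recursive.
1) First observe that

$$H(P_n) = -\sum_x P_n(x)\log P_n(x) = \frac{1}{2} + 2^{n-1} \cdot \frac{n}{2^n} = \frac{n+1}{2}.$$

Notice also that $K(P_n) = O(\log n)$.
For each n, fix α > 1 such that $\alpha - 1 = (n-1)^{-3/2}$. We want to prove that there is $n_0$ such that, for all $n \ge n_0$ and this choice of α,

$$\sum_x P_n(x)K(x) - H_\alpha(P_n) > K(P_n).$$

We have

$$H_\alpha(P_n) = \frac{1}{1-\alpha}\log\sum_x P_n(x)^\alpha = \frac{1}{1-\alpha}\log\left(\frac{1}{2^\alpha} + 2^{n-1} \times \frac{1}{2^{n\alpha}}\right) = \frac{1}{1-\alpha}\left(\log\left(2^{(n-1)\alpha} + 2^{n-1}\right) - n\alpha\right).$$

Now we calculate $\log(2^{(n-1)\alpha} + 2^{n-1})$. To simplify notation, write $x = n - 1$ and $\alpha = 1 + \varepsilon$, with ε > 0. Thus,

$$\log\left(2^{(n-1)\alpha} + 2^{n-1}\right) = \log\left(2^{x(1+\varepsilon)} + 2^x\right) = \log\left(2^x\left(2^{x\varepsilon} + 1\right)\right) = x + \log\left(2^{x\varepsilon} + 1\right).$$

Consider $\delta = x\varepsilon$. It is clear that

$$2^\delta = e^{\delta\ln 2} = 1 + \ln 2 \cdot \delta + \frac{(\ln 2)^2}{2}\delta^2 + \cdots,$$

then

$$2^\delta + 1 = 2 + \beta, \qquad \text{where } \beta = \ln 2 \cdot \delta + \frac{(\ln 2)^2}{2}\delta^2 + \frac{(\ln 2)^3}{6}\delta^3 + \cdots,$$

and hence $\log(2^\delta + 1) = \log(2 + \beta)$. Notice that $\lim_{\alpha\to 1}\beta = 0$, and

$$\log(2 + \beta) = \frac{1}{\ln 2}\ln(2 + \beta) = \frac{1}{\ln 2}\ln\left(2\left(1 + \frac{\beta}{2}\right)\right) = 1 + \frac{\beta}{2\ln 2} - \frac{\beta^2}{8\ln 2} + \cdots$$

Substituting β, we obtain

$$\log\left(2^{x\varepsilon} + 1\right) = 1 + \frac{x\varepsilon}{2} + \frac{\ln 2}{8}(x\varepsilon)^2 + \cdots,$$

which means

$$x + \log\left(2^{x\varepsilon} + 1\right) = x + 1 + \frac{x\varepsilon}{2} + \frac{\ln 2}{8}(x\varepsilon)^2 + \cdots$$

Thus, since $x + 1 = n$,

$$H_\alpha(P_n) = -\frac{1}{\alpha-1}\left(\log\left(2^{(n-1)\alpha} + 2^{n-1}\right) - n\alpha\right) = \frac{n+1}{2} - \frac{\ln 2}{8}(n-1)^2(\alpha-1) + c_1(n-1)^3(\alpha-1)^2 + c_2(n-1)^4(\alpha-1)^3 + \cdots,$$

with $c_1, c_2, \ldots \in \mathbb{R}$. The remaining terms of the series expansion, $c_1(n-1)^3(\alpha-1)^2 + c_2(n-1)^4(\alpha-1)^3 + \cdots$, stay bounded as n grows, since $\alpha - 1 = (n-1)^{-3/2}$, so they do not affect the argument below.
It is known that $\lim_{\alpha\to1} H_\alpha(P_n) = H(P_n)$. In fact, for this choice of α we have

$$H_\alpha(P_n) = H(P_n) - \frac{\ln 2}{8}(n-1)^{1/2} + O(1).$$

Now the first item is proved by contradiction. Assume that, for some $c \in \mathbb{R}$ and infinitely many n,

$$\sum_x P_n(x)K(x) - H_\alpha(P_n) \le c\log n,$$

i.e., for those n,

$$\sum_x P_n(x)K(x) - H(P_n) + \frac{\ln 2}{8}(n-1)^{1/2} - O(1) \le c\log n.$$

Since $\sum_x P_n(x)K(x) - H(P_n) \ge 0$ (Theorem 11), we would have $\frac{\ln 2}{8}(n-1)^{1/2} - O(1) \le c\log n$ for infinitely many n, which is impossible. So we conclude that, for all sufficiently large n,

$$\sum_x P_n(x)K(x) - H_\alpha(P_n) > c\log n,$$

and in particular, as $K(P_n) = O(\log n)$, that $\sum_x P_n(x)K(x) - H_\alpha(P_n) > K(P_n)$.

2) Analogous to the proof of the previous item, but now fixing $\alpha - 1 = -(n-1)^{-3/2}$, so that α < 1 and $H_\alpha(P_n) = H(P_n) + \frac{\ln 2}{8}(n-1)^{1/2} + O(1)$. Combined with the upper bound $\sum_x P_n(x)K(x) - H(P_n) \le K(P_n) = O(\log n)$ of Theorem 11, this gives $\sum_x P_n(x)K(x) - H_\alpha(P_n) < 0$ for all sufficiently large n. □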
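The expansion $H_\alpha(P_n) \approx \frac{n+1}{2} - \frac{\ln 2}{8}(n-1)^2(\alpha-1)$ used in the proof of Corollary 14 can be checked numerically. The Python sketch below (names are illustrative) stores $P_n$ as (probability, multiplicity) pairs, since $P_n$ takes only two non-zero values, computes $H(P_n)$ and $H_\alpha(P_n)$ exactly from the definitions, and compares them with $H(P_n) \mp \frac{\ln 2}{8}(n-1)^{1/2}$ for $\alpha - 1 = \pm(n-1)^{-3/2}$. Only moderate n are used, to stay within double-precision range.

```python
import math


def renyi_grouped(groups, a):
    """Renyi entropy (bits) of a distribution given as (probability, multiplicity) pairs."""
    if a == 1:
        return -sum(m * p * math.log2(p) for p, m in groups)
    return math.log2(sum(m * p ** a for p, m in groups)) / (1 - a)


def P_n_groups(n):
    """P_n from the proof: 0^n has mass 1/2, each of the 2^(n-1) strings 1x' has mass 2^-n."""
    return [(0.5, 1), (2.0 ** -n, 2 ** (n - 1))]


for n in (101, 401):
    g = P_n_groups(n)
    H = renyi_grouped(g, 1)                       # equals (n + 1) / 2
    eps = (n - 1) ** -1.5                         # |alpha - 1| as chosen in the proof
    shift = math.log(2) / 8 * math.sqrt(n - 1)
    print(n, H, round(renyi_grouped(g, 1 + eps), 4), round(H - shift, 4))
    print(n, H, round(renyi_grouped(g, 1 - eps), 4), round(H + shift, 4))
```

In both cases the exact value and the two-term approximation agree to within about $10^{-3}$, consistent with the neglected terms being bounded.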



If, instead of considering $K(P)$ and $K(x)$ in the inequalities of Theorem 11, we use their time-bounded versions and impose some computational restrictions on the distributions, we obtain a similar result. Notice that, for the class of distributions in the following theorem, the entropy equals (up to a constant) the expected time-bounded algorithmic entropy.



Theorem 15. Let P be a probability distribution such that P*, the cumulative probability distribution of P, is computable in time $t(n)$. Then:

$$0 \le \sum_x P(x)K^{nt(n)}(x) - H(P) \le K^{nt(n)}(P).$$

Proof. The first inequality follows directly from Theorem 11 and from the fact that $K^t(x) \ge K(x)$.

By Theorem 6, if P is a probability distribution such that P* is computable in time $t(n)$, then for all $x \in \Sigma^*$

$$K^{nt(n)}(x) + \log P(x) \le K^{nt(n)}(P).$$

Then, multiplying by $P(x)$ and summing over all x, we get

$$\sum_x P(x)\left(K^{nt(n)}(x) + \log P(x)\right) \le \sum_x P(x)\,K^{nt(n)}(P),$$

which is equivalent to

$$\sum_x P(x)K^{nt(n)}(x) - H(P) \le K^{nt(n)}(P). \qquad \square$$

This result partially solves, for this type of distributions, the 
problem of finding a measure that captures the notion of com- 
putationally accessible information. This is an important open 
problem with applications and consequences in cryptography. 

IV. On the Entropy of the Time-Bounded Algorithmic Universal Distribution

We now focus our attention on the universal distribution. Its main drawback is the fact that it is not computable. In order to make it computable, one can impose restrictions on the time that a program can use to produce a string, obtaining the time-bounded universal distribution ($m^t(x) = c\,2^{-K^t(x)}$). We investigate the behavior of the different entropies under this distribution. The proof of the following theorem uses some ideas from [KT].

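Theorem 16. The Shannon entropy of the distribution $m^t$ diverges.

Proof. If $x \ge 2$, then $f(x) = x2^{-x}$ is a decreasing function. Let A be the set of strings x such that $-\log m^t(x) \ge 2$. Since $m^t$ is computable, A is recursively enumerable. Notice also that A is infinite and contains arbitrarily long Kolmogorov-random strings. We have

$$-\sum_x m^t(x)\log m^t(x) \ge -\sum_{x\in A} m^t(x)\log m^t(x) = \sum_{x\in A} c\,2^{-K^t(x)}\left(K^t(x) - \log c\right) = c\sum_{x\in A} K^t(x)\,2^{-K^t(x)} - \log c\sum_{x\in A} m^t(x).$$

Since $\sum_{x\in A} m^t(x) \le 1$, if we prove that $\sum_{x\in A} K^t(x)\,2^{-K^t(x)}$ diverges, the result follows.

Assume, by contradiction, that $\sum_{x\in A} K^t(x)\,2^{-K^t(x)} \le d$ for some $d \in \mathbb{R}$. Then, considering $r(x) = \frac{1}{d}K^t(x)\,2^{-K^t(x)}$ if $x \in A$ and $r(x) = 0$ otherwise, we conclude that r is a constructive semi-measure. Thus, there exists a constant c' such that, for all x, $r(x) \le c'\,m(x)$. Hence, for $x \in A$, we have

$$\frac{1}{d}K^t(x)\,2^{-K^t(x)} \le c'\,2^{-K(x)}.$$

So, $K^t(x) \le c'd\,2^{K^t(x) - K(x)}$. This is a contradiction, since A contains Kolmogorov-random strings of arbitrarily large length: for such strings $K^t(x) \ge K(x) \ge |x|$ grows without bound, while, by Proposition 2, $K^t(x) - K(x) \le |x| + O(1) - |x| = O(1)$. The contradiction results from assuming that $\sum_{x\in A} K^t(x)\,2^{-K^t(x)}$ converges. So $H(m^t)$ diverges. □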

Now we show that, similarly to the behavior of the entropies of the universal distribution, $T_\alpha(m^t) < \infty$ iff α > 1 and $H_\alpha(m^t) < \infty$ iff α > 1. First observe that we have the following ordering relationship between these two entropies for every probability distribution P:

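1) If α > 1, $T_\alpha(P) = \dfrac{1 - 2^{(1-\alpha)H_\alpha(P)}}{\alpha - 1} \le \min\left\{\dfrac{1}{\alpha-1},\ (\ln 2)\,H_\alpha(P)\right\}$;

2) If α < 1, $T_\alpha(P) = \dfrac{2^{(1-\alpha)H_\alpha(P)} - 1}{1 - \alpha} \ge (\ln 2)\,H_\alpha(P)$.

Both relations follow from $\sum_x P(x)^\alpha = 2^{(1-\alpha)H_\alpha(P)}$ together with $1 + y \le e^y$.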
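The identity $\sum_x P(x)^\alpha = 2^{(1-\alpha)H_\alpha(P)}$ ties $T_\alpha$ to $H_\alpha$, giving $T_\alpha(P) = (1 - 2^{(1-\alpha)H_\alpha(P)})/(\alpha - 1)$, with $T_\alpha(P) \le \min\{1/(\alpha-1), (\ln 2)H_\alpha(P)\}$ for α > 1 and $T_\alpha(P) \ge (\ln 2)H_\alpha(P)$ for α < 1. A small Python sketch checks the identity and both bounds on a few hand-picked finite distributions. This is a sanity check of the algebra under those example inputs, not a proof.

```python
import math


def renyi(p, a):
    return math.log2(sum(q ** a for q in p)) / (1 - a)


def tsallis(p, a):
    return (1 - sum(q ** a for q in p)) / (a - 1)


def tsallis_from_renyi(h, a):
    """T_a expressed through H_a, using sum_x P(x)^a = 2^((1 - a) H_a)."""
    return (1 - 2 ** ((1 - a) * h)) / (a - 1)


def relations_hold(p, a, tol=1e-12):
    """Check the identity and the alpha > 1 / alpha < 1 bounds for one distribution."""
    h, t = renyi(p, a), tsallis(p, a)
    identity = math.isclose(t, tsallis_from_renyi(h, a), rel_tol=1e-9, abs_tol=tol)
    if a > 1:
        bound = t <= min(1 / (a - 1), math.log(2) * h) + tol
    else:
        bound = t >= math.log(2) * h - tol
    return identity and bound


examples = [[0.5, 0.25, 0.125, 0.125], [0.9, 0.05, 0.05], [0.25] * 4]
print(all(relations_hold(p, a) for p in examples for a in (0.3, 0.7, 1.5, 3.0)))  # True
```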



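Theorem 17. Let α ≠ 1 be a computable real number. Then $T_\alpha(m^t) < \infty$ iff α > 1.

Proof. From Theorem 8 of [KT], it is known that $\sum_x (m(x))^\alpha$ converges iff α > 1. Since $m^t$ is a computable probability measure, there exists a constant λ such that, for all x, $m^t(x) \le \lambda\,m(x)$. So $(m^t(x))^\alpha \le (\lambda\,m(x))^\alpha$, which implies that $\sum_x (m^t(x))^\alpha \le \lambda^\alpha\sum_x (m(x))^\alpha$, from which we conclude that, for α > 1, $T_\alpha(m^t)$ converges.

For α < 1, the proof is analogous to the proof of Theorem 16. Suppose that $\sum_x (m^t(x))^\alpha \le d$ for some $d \in \mathbb{R}$. Hence, since α is computable, $r(x) = \frac{1}{d}(m^t(x))^\alpha$ is a computable semi-measure. Then there exists a constant τ such that, for all $x \in \Sigma^*$,

$$r(x) = \frac{1}{d}\left(c\,2^{-K^t(x)}\right)^\alpha \le \tau\,2^{-K(x)},$$

which is equivalent to $\frac{c^\alpha}{d\tau} \le 2^{\alpha K^t(x) - K(x)}$. For example, if x is random, then $K(x) \ge |x|$ and $K^t(x) \le |x| + O(1)$, so it follows that $\frac{c^\alpha}{d\tau} \le 2^{(\alpha-1)|x| + O(1)}$, which is false for sufficiently long x, since α < 1. Hence $\sum_x (m^t(x))^\alpha$ diverges, and so does $T_\alpha(m^t)$. □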



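Theorem 18. The Renyi entropy of order α of the time-bounded universal distribution converges for α > 1 and diverges for α < 1.

Proof. Consider α = 1 + ε, where ε > 0. Since, for all $x \in \Sigma^*$, $K^t(x) \le |x| + c$ (Proposition 2), we have $2^{-(|x|+c)} \le 2^{-K^t(x)}$. Since $f(y) = y^{1+\varepsilon}$ is increasing on $[0,1]$, it is also true that, for all $x \in \Sigma^*$, $\left(2^{-(|x|+c)}\right)^{1+\varepsilon} \le \left(2^{-K^t(x)}\right)^{1+\varepsilon}$. So, summing over all $x \in \Sigma^*$ and applying $-\log$, we conclude that

$$-\log\sum_x\left(2^{-K^t(x)}\right)^{1+\varepsilon} \le -\log\sum_x\left(2^{-(|x|+c)}\right)^{1+\varepsilon}.$$

Since $H_{1+\varepsilon}(m^t) = \frac{1}{\varepsilon}\left(-(1+\varepsilon)\log c - \log\sum_x\left(2^{-K^t(x)}\right)^{1+\varepsilon}\right)$, if we prove that the (positive) series $\sum_x\left(2^{-(|x|+c)}\right)^{1+\varepsilon}$ converges, then the Renyi entropy of order 1 + ε of $m^t$ also converges. Indeed,

$$\sum_{x\in\Sigma^*}\left(2^{-(|x|+c)}\right)^{1+\varepsilon} = \sum_{n=0}^{\infty}\sum_{x\in\Sigma^n} 2^{-(n+c)(1+\varepsilon)} = \sum_{n=0}^{\infty} 2^n \times 2^{-n-n\varepsilon-c-c\varepsilon} = 2^{-c-c\varepsilon}\sum_{n=0}^{\infty} 2^{-n\varepsilon} = 2^{-c-c\varepsilon} \times \frac{2^\varepsilon}{2^\varepsilon - 1} < \infty.$$

Now, assume that α < 1. Since the Renyi entropy is non-increasing in α, for any distribution P we have $H(P) \le H_\alpha(P)$. So, in particular, $H(m^t) \le H_\alpha(m^t)$. As $H(m^t)$ diverges (Theorem 16), we conclude that the Renyi entropy of order α < 1 of the time-bounded universal distribution diverges. □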
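The geometric series in the proof of Theorem 18, $\sum_{n \ge 0} 2^n \cdot 2^{-(n+c)(1+\varepsilon)} = 2^{-c(1+\varepsilon)}\,2^\varepsilon/(2^\varepsilon - 1)$, can be checked numerically. In the sketch below, c stands for the constant of $K^t(x) \le |x| + c$ and is given an arbitrary illustrative value; the $2^n$ strings of each length are grouped, so the partial sum runs over lengths rather than strings.

```python
def series_partial_sum(c, eps, N):
    """Sum over all x with |x| <= N of (2^-(|x| + c))^(1 + eps); there are 2^n strings of length n."""
    return sum(2.0 ** (n - (n + c) * (1 + eps)) for n in range(N + 1))


def series_closed_form(c, eps):
    """2^(-c(1+eps)) * 2^eps / (2^eps - 1), the value of the full series."""
    return 2.0 ** (-c * (1 + eps)) * 2.0 ** eps / (2.0 ** eps - 1)


for eps in (1.0, 0.5, 0.1):
    print(eps, series_partial_sum(3, eps, 2000), series_closed_form(3, eps))
```

The closed form is finite for every ε > 0 but grows without bound as ε → 0, which is where the argument for α = 1 + ε stops giving a bound.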